Increasing Return on Annotation Investment: The Automatic Construction of a Universal Dependency Treebank for Dutch

Authors

  • Gosse Bouma
  • Gertjan van Noord
Abstract

We present a method for automatically converting the Dutch Lassy Small treebank, a phrasal dependency treebank, to Universal Dependencies (UD). All of the information required to produce accurate UD annotation appears to be available in the underlying annotation. However, we also note that the close connection between POS tags and dependency labels that is present in UD is missing in the Lassy treebanks. As a consequence, annotation decisions in the Dutch data for such phenomena as nominalization and clausal complements of prepositions seem to differ to some extent from comparable data in English and German. Because the conversion is automatic, we can now also compare three state-of-the-art dependency parsers trained on UD Lassy Small with Alpino, a hybrid Dutch parser which produces output that is compatible with the original Lassy annotations.

Similar resources

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

Automatic Conversion of the Basque Dependency Treebank to Universal Dependencies

This work describes the process of automatically converting the Basque Dependency Treebank to Universal Dependencies (UD). Our objective is to develop a set of conversion rules that will automatically transform the original treebank to UD. Basque is a morphologically rich and agglutinative language, which presents different challenges for the conversion from the initial annotation scheme to UD....

Annotating a parallel monolingual treebank with semantic similarity relations

We describe an ongoing effort to build a large-scale parallel and comparable monolingual treebank for Dutch of 1 million words, where nodes of dependency trees are aligned and labeled according to a limited set of semantic similarity relations. We address alignment of sentences and dependency trees, both manual and automatic. We introduce new annotation tools, present results from pilot experim...

DCU 250 Arabic Dependency Bank: An LFG Gold Standard Resource for the Arabic Penn Treebank

This paper describes the construction of a dependency bank gold standard for Arabic, DCU 250 Arabic Dependency Bank (DCU 250), based on the Arabic Penn Treebank Corpus (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) within the theoretical framework of Lexical Functional Grammar (LFG). For parsing and automatically extracting grammatical and lexical resources from treebanks, it is neces...

Publication date: 2017